Abstract. The proximal point algorithm, which is a well-known tool for finding
minima of convex functions, is generalized from the classical Hilbert space
framework into a nonlinear setting, namely, geodesic metric spaces of nonpositive
curvature. We prove that the sequence generated by the proximal point algorithm
weakly converges to a minimizer, and also discuss a related question:
convergence of the gradient flow.



1. Introduction 

The proximal point algorithm (PPA) is a method for finding a minimizer of
a convex lower semicontinuous (lsc, for short) function defined on a Hilbert space.
Its origin goes back to Martinet, Rockafellar, and Brezis & Lions [3, 22, 25]. The
algorithm has since become extremely popular among practitioners in optimization,
and has also given rise to many challenging mathematical problems. For instance,
Rockafellar's 1976 question [25] as to whether or not the PPA always converges
strongly was settled (in the negative) as late as the 1990s by Güler [13]. The
literature on the subject has become too extensive to be even partially listed here.
We believe the interested reader will easily find further information on this field.

Gradually, many of the algorithms for solving optimization problems have been
generalized from linear spaces (Euclidean, Hilbert, Banach) into differentiable
manifolds. In particular, the proximal point algorithm in the context of Riemannian
manifolds (of nonpositive sectional curvature) was studied in [TH [U3 [H]. We
continue along these lines and introduce the PPA into geodesic metric spaces of
nonpositive curvature, so-called CAT(0) spaces. Since minimizers of convex lsc
functionals in these spaces play an important role in analysis and geometry (see,
for instance, Sections 1.2 and 1.3), we dare to believe that the PPA will prove
useful. We should also like to mention Zaslavski's very recent paper [27], which
takes a different approach to the PPA in metric spaces.

The aim of this paper is to introduce the proximal point algorithm into metric 
spaces of nonpositive curvature and show weak convergence of this algorithm. 

There are of course natural obstacles one has to overcome in CAT(0) spaces.
Unlike Riemannian manifolds, CAT(0) spaces do not come equipped with a
Riemannian metric, and, probably relatedly, we do not have a notion of a subgradient
of a convex function. The proof of weak convergence of the PPA in Hilbert spaces,
on the other hand, does use both the inner product and the convex subgradient
[13, 22, 25], and therefore we cannot simply translate the existing proof into
the context of metric spaces. Furthermore, a general CAT(0) space is not locally
compact, which in general prevents us from proving strong convergence of the PPA
and forces us to settle for weak convergence.

The results of the present paper may be of interest also in Hilbert spaces, as
they show that the PPA as well as the gradient flow semigroup are purely metric
objects in spite of their linear origins.

1.1. Proximal point algorithm. Let $H$ be a Hilbert space and
$f : H \to (-\infty, \infty]$ be a convex lsc function which attains its minimum
on $H$. The proximal point algorithm seeks a minimizer of $f$ by successive
approximations

$$x_n = \operatorname*{arg\,min}_{y \in H} \left[ f(y) + \frac{1}{2\lambda_n} \| y - x_{n-1} \|^2 \right], \qquad n \in \mathbb{N}, \tag{1}$$

where $x_0 \in H$ is a given starting point, and $\lambda_n > 0$ for all
$n \in \mathbb{N}$. The sequence $(x_n)$ is known to converge weakly to a minimizer
of $f$, provided $\sum_{n=1}^{\infty} \lambda_n = \infty$, see [13, 22, 25]. A natural
question, posed by Rockafellar in [25], whether this convergence can be improved to
strong was answered in the negative by Güler [13, Corollary 5.1]. In other words,
weak convergence is the best we can achieve without additional assumptions. It is
worth mentioning that counterexamples to strong convergence of the PPA are still
very rare [H H].
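To make the role of the step sizes concrete, the following minimal sketch runs the iteration (1) on the real line for the illustrative choice $f(y) = |y|$, whose proximal step has the well-known closed form of soft thresholding. The function names and the two step-size sequences are assumptions made for this illustration only and do not come from the paper.

```python
import math

def prox_abs(x, lam):
    """Proximal step for f(y) = |y|: argmin_y |y| + (y - x)**2 / (2*lam)."""
    return math.copysign(max(abs(x) - lam, 0.0), x)

def ppa(x0, steps):
    """Iterate x_n = prox_abs(x_{n-1}, lam_n) and return all iterates."""
    xs = [x0]
    for lam in steps:
        xs.append(prox_abs(xs[-1], lam))
    return xs

# Step sizes 1/n have divergent sum, so the iterates reach the minimizer 0.
divergent = ppa(5.0, [1.0 / n for n in range(1, 201)])

# Step sizes 2**(-n) sum to less than 1, so the iterates move by less than 1
# in total and stall far from the minimizer.
summable = ppa(5.0, [2.0 ** (-n) for n in range(1, 201)])

print(divergent[-1], summable[-1])
```

With summable step sizes the sequence still converges, but to a point that is not a minimizer, which is why the condition $\sum \lambda_n = \infty$ cannot simply be dropped.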

1.2. CAT(0) spaces. Geodesic metric spaces of nonpositive curvature in the sense
of Alexandrov, that is, CAT(0) spaces in Gromov's terminology, include Hilbert
spaces, $\mathbb{R}$-trees, Euclidean Bruhat-Tits buildings, complete simply
connected Riemannian manifolds of nonpositive sectional curvature, and many other
important spaces included in none of the above classes.

There are several equivalent conditions for a geodesic metric space $(X, d)$ to be
CAT(0). One of them is the following inequality, which is to be satisfied for any
$x \in X$, any geodesic $\gamma : [a, b] \to X$, and any $t \in [0, 1]$, where
$\gamma_t = \gamma((1-t)a + tb)$ denotes the point of the geodesic at parameter
$t$ measured proportionally from $\gamma(a)$ to $\gamma(b)$:

$$d(x, \gamma_t)^2 \le (1 - t)\, d(x, \gamma(a))^2 + t\, d(x, \gamma(b))^2 - t(1 - t)\, d(\gamma(a), \gamma(b))^2. \tag{2}$$
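In a Hilbert space the inequality (2) holds with equality, which is one way to see that Hilbert spaces are CAT(0). The sketch below checks this numerically in Euclidean 3-space on random data; the helper names are hypothetical and the check is an illustration, not a proof.

```python
import random

def dist2(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def point_on_segment(p, q, t):
    """Point (1 - t) * p + t * q of the straight segment from p to q."""
    return tuple((1 - t) * a + t * b for a, b in zip(p, q))

def cat0_gap(x, p, q, t):
    """Right-hand side minus left-hand side of (2) for the segment from p to q."""
    lhs = dist2(x, point_on_segment(p, q, t))
    rhs = (1 - t) * dist2(x, p) + t * dist2(x, q) - t * (1 - t) * dist2(p, q)
    return rhs - lhs

def random_point(rng):
    return tuple(rng.uniform(-10.0, 10.0) for _ in range(3))

rng = random.Random(0)
gaps = [
    cat0_gap(random_point(rng), random_point(rng), random_point(rng), rng.random())
    for _ in range(1000)
]
max_abs_gap = max(abs(g) for g in gaps)  # zero up to rounding in Euclidean space
```

In a space of strictly negative curvature the gap would be positive for generic data; only the sign condition is required by the definition.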

Convex functions on CAT(0) spaces are our principal object of interest in this
paper. Recall that a function $f : C \to (-\infty, \infty]$, defined on a convex
subset $C$ of a CAT(0) space, is convex if, for any geodesic
$\gamma : [0, 1] \to C$, the function $f \circ \gamma$ is convex. Here we collect
several important examples [5].

Example 1.1 (Distance functions). The function

$$x \mapsto d(x, x_0), \qquad x \in X, \tag{3}$$

where $x_0$ is a fixed point of $X$, is convex and continuous. The square of this
function is even strictly convex. More generally, the distance function $d_C$ to
a closed convex subset $C \subset X$, defined as

$$d_C(x) = \inf_{c \in C} d(x, c), \qquad x \in X,$$

is convex and 1-Lipschitz [8, Proposition 2.4, p. 176].



Example 1.2 (Displacement functions). Let $T : X \to X$ be an isometry. The
displacement function of $T$ is the function $\delta_T : X \to [0, \infty)$
defined by

$$\delta_T(x) = d(x, Tx),$$

for all $x \in X$. It is convex and Lipschitz [8, Definition II.6.1].

Example 1.3 (Busemann functions). Let $c : [0, \infty) \to X$ be a geodesic ray.
The function $b_c : X \to \mathbb{R}$ defined by

$$b_c(x) = \lim_{t \to \infty} \left[ d(x, c(t)) - t \right], \qquad x \in X,$$

is called the Busemann function associated to the ray $c$, see
[8, Definition II.8.7]. Busemann functions are convex and 1-Lipschitz. Concrete
examples of Busemann functions are given in [8, p. 273]. Another explicit example
of a Busemann function, in the CAT(0) space of positive definite $n \times n$
matrices with real entries, is found in [5, Proposition 10.69]. The sublevel sets
of Busemann functions are called horoballs and carry a lot of information about
the geometry of the space in question, see [8] and the references therein.

The energy functional is another important instance of a convex function on a
CAT(0) space, see [TSJ Chapter 7], or more generally [16, Chapter 4]. Minimizers
of the energy functional are called harmonic maps, and are of immense importance
in both geometry and analysis.

1.3. Resolvents and semigroups. Let $(X, d)$ be a complete CAT(0) space, and
$f : X \to (-\infty, \infty]$ be lsc convex. For $\lambda > 0$, define the
Moreau-Yosida resolvent of $f$ as

$$J_\lambda(x) = \operatorname*{arg\,min}_{y \in X} \left[ f(y) + \frac{1}{2\lambda} d(y, x)^2 \right],$$

and put $J_0(x) = x$, for all $x \in X$. This definition in metric spaces with no
linear structure first appeared in [15]. The mapping $J_\lambda$ is well defined
for all $\lambda > 0$, see [TBI Lemma 2] and [23, Theorem 1.8].

The Moreau-Yosida resolvents are essential in the proof of existence of harmonic
maps. Indeed, the energy functional is convex and lsc on a suitable CAT(0) space
$Y$ of $L^2$-mappings, and $J_\lambda(y)$, with an arbitrary $y \in Y$, is shown
to strongly converge to a minimizer of the energy functional (to a harmonic map),
as $\lambda \to \infty$. For the details, see [HI TH Ml 02].

In spite of the significance of the convergence of $J_\lambda$, it is more
desirable to establish convergence of the corresponding (gradient flow) semigroup
$(T_\lambda)_{\lambda \ge 0}$, which is given as

$$T_\lambda x = \lim_{n \to \infty} \left( J_{\lambda/n} \right)^{n}(x), \qquad x \in \operatorname{dom} f. \tag{4}$$

The limit in (4) is uniform with respect to $\lambda$ on bounded subintervals of
$[0, \infty)$, and $(T_\lambda)_{\lambda \ge 0}$ is a strongly continuous semigroup
of nonexpansive mappings, see [17, Theorem 1.3.13] and [23, Theorem 1.13].
Unfortunately, the semigroup $(T_\lambda)$ converges only weakly (see Theorem 1.5
below and the subsequent discussion).

It is well known that the PPA is a discrete version of the gradient flow
semigroup Q3J[23].
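The relation between the resolvent and the gradient flow can be seen on a one-dimensional example. For the illustrative choice $f(y) = y^2/2$ on the real line, the resolvent with parameter $\mu$ is $x \mapsto x/(1+\mu)$, and the gradient flow started at $x$ is $x e^{-\lambda}$. The sketch below, whose names and parameters are assumptions made for illustration, applies the resolvent with step $\lambda/n$ a total of $n$ times and compares the result with the exact flow.

```python
import math

def resolvent_quadratic(x, mu):
    """Resolvent of f(y) = y**2 / 2: argmin_y y**2 / 2 + (y - x)**2 / (2*mu)."""
    return x / (1.0 + mu)

def iterated_resolvent(x, lam, n):
    """Apply the resolvent with parameter lam / n a total of n times."""
    for _ in range(n):
        x = resolvent_quadratic(x, lam / n)
    return x

x0, lam = 3.0, 2.0
exact_flow = x0 * math.exp(-lam)  # solution of x'(t) = -x(t), x(0) = x0, at t = lam
errors = [
    abs(iterated_resolvent(x0, lam, n) - exact_flow)
    for n in (10, 100, 1000, 10000)
]
print(errors)
```

The errors shrink roughly in proportion to $1/n$, in line with the view of the PPA with constant step as an implicit Euler discretization of the flow.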

1.4. Main results. Let $(X, d)$ be a complete CAT(0) space and
$f : X \to (-\infty, \infty]$ be a lsc convex function. Given a sequence of
positive reals $(\lambda_n)$, the proximal point algorithm starting at a point
$x_0 \in X$ generates in the $n$-th step, $n \in \mathbb{N}$, the point

$$x_n = \operatorname*{arg\,min}_{y \in X} \left[ f(y) + \frac{1}{2\lambda_n} d(y, x_{n-1})^2 \right]. \tag{5}$$

Recall that $x_n$ is well-defined. By Güler's result [13], only weak convergence of
$(x_n)$ to a minimizer of $f$ can be expected in general.
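As a small illustration of the iteration in a CAT(0) space that is not a Hilbert space, consider a tripod, that is, three copies of $[0, \infty)$ glued at $0$, and the convex function $f = d(\cdot, a)$ for a fixed point $a$. Minimizing along the geodesic from $x$ to $a$ shows that the proximal step moves $x$ toward $a$ by the distance $\min(\lambda, d(x, a))$. The encoding of points as pairs (leg, radius) and all function names below are assumptions made for this sketch.

```python
def dist(p, q):
    """Distance on the tripod; a point is (leg, radius) with radius >= 0."""
    (i, r), (j, s) = p, q
    if i == j or r == 0 or s == 0:
        return abs(r - s)
    return r + s  # the geodesic passes through the centre

def move_toward(p, a, step):
    """Point of the geodesic [p, a] at distance min(step, dist(p, a)) from p."""
    step = min(step, dist(p, a))
    (i, r), (j, s) = p, a
    if i == j or s == 0:  # the whole geodesic stays on the leg of p
        return (i, r + step) if (i == j and s > r) else (i, r - step)
    if step <= r:  # still on the leg of p, heading for the centre
        return (i, r - step)
    return (j, step - r)  # crossed the centre onto the leg of a

def ppa_distance(x0, a, steps):
    """Proximal point iterates for f = dist(., a) on the tripod."""
    xs = [x0]
    for lam in steps:
        xs.append(move_toward(xs[-1], a, lam))
    return xs

a = (1, 1.5)
iterates = ppa_distance((0, 2.0), a, [1.0 / n for n in range(1, 31)])
gaps = [dist(x, a) for x in iterates]
print(iterates[-1], gaps[-1])
```

The iterates first travel down leg 0, pass the centre, and then climb leg 1 until they reach $a$; the distances to the minimizer never increase along the way.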

The main results of the present paper are the following two theorems.

Theorem 1.4. Let $(X, d)$ be a complete CAT(0) space, and
$f : X \to (-\infty, \infty]$ be a convex lsc function. Suppose that $f$ has a
minimizer, that is, there exists a point $c \in X$ such that

$$f(c) = \inf_{x \in X} f(x).$$

Then, for an arbitrary starting point $x_0 \in X$, and a sequence of positive reals
$(\lambda_n)$ such that $\sum_{n=1}^{\infty} \lambda_n = \infty$, the sequence
$(x_n) \subset X$ defined by (5) weakly converges to a minimizer of $f$.

The proof of Theorem 1.4 is given in Section 3. It is based on Fejér monotonicity,
whereas the classical Hilbert space proofs do not use this feature [13, 22, 25].
However, it was later observed by Combettes that the PPA sequence is Fejér
monotone [9].

We also study an object which is closely related to the PPA, namely, the gradient
flow, and obtain the following result.

Theorem 1.5. Let $X$ be a complete CAT(0) space, and $f : X \to (-\infty, \infty]$
be lsc convex. Assume that $f$ attains its minimum on $X$. Then, given a starting
point $x \in \operatorname{dom} f$, the gradient flow $T_\lambda x$ defined in (4)
weakly converges to a minimizer of $f$, as $\lambda \to \infty$.

In [TTJ p. 24], the author discusses the following problem. If there is a sequence
$(\lambda_n) \subset (0, \infty)$ such that $\lambda_n \to \infty$, and such that
the sequence $(T_{\lambda_n} x)_n$ is bounded, is it then the case that
$(T_\lambda x)$ converges (strongly) to a minimizer of $f$, as
$\lambda \to \infty$? The answer is no: by Fejér monotonicity we know that the
sequence $(T_{\lambda_n} x)_n$ is bounded, see Proposition 2.3 below; however,
Baillon's example [1] shows that there is a semigroup which converges weakly to a
minimizer, but fails to converge strongly. For the details, see the proof of
Theorem 1.5 in Section 3.

We end the Introduction with the following two remarks.

Remark 1.6 (Rate of the convergence). As we shall see in the proof of Theorem 1.4,
more precisely in (8), for any $n \in \mathbb{N}$, we have

$$f(x_n) - \inf_X f \le \frac{K}{\sum_{k=1}^{n} \lambda_k},$$

where $K$ is a positive constant. In other words, the rate of convergence (of the
function values along the weakly convergent sequence) is
$O\big( ( \sum_{k=1}^{n} \lambda_k )^{-1} \big)$.

Remark 1.7 (Strong convergence). In order to ensure strong convergence in
Theorems 1.4 and 1.5 we need to impose additional assumptions on the data. For
instance, it would be sufficient to require the underlying space to be locally
compact. Or, we may require the function $f$ to be uniformly convex on bounded
subsets of its domain, see [3, Theorem 27.1(iii)] and [23, Lemma 1.7]. Recall the
definition. Let $\phi : [0, \infty) \to [0, \infty]$ be a non-decreasing function
vanishing only at $0$. A function $h : X \to (-\infty, \infty]$ is uniformly convex
on a set $A \subset \operatorname{dom} h$ with modulus $\phi$ if

$$h(\alpha x + (1 - \alpha) y) + \alpha (1 - \alpha)\, \phi(d(x, y)) \le \alpha h(x) + (1 - \alpha) h(y),$$

for any $x, y \in A$ and any $\alpha \in [0, 1]$. We refer the reader to the proof
of Remark 1.7 in Section 3 for the argument that uniform convexity of the function
implies strong convergence in Theorems 1.4 and 1.5.

2. Preliminaries

We first recall basic notation concerning CAT(0) spaces. For further details on
the subject, the reader is referred to [5]. Let $(X, d)$ be a CAT(0) space. Having
two points $x, y \in X$, we denote the geodesic segment from $x$ to $y$ by
$[x, y]$. We usually do not distinguish between a geodesic and its geodesic
segment, as no confusion can arise. A set $C \subset X$ is convex if
$x, y \in C$ implies $[x, y] \subset C$. For a point $z \in [x, y]$, we write
$z = tx + (1 - t)y$, where $t = d(z, y)/d(x, y)$.

Given $x, y, z \in X$, the symbol $\alpha(y, x, z)$ denotes the (Alexandrov) angle
between the geodesics $[x, y]$ and $[x, z]$. The corresponding angle in the
comparison triangle is denoted $\alpha'(y, x, z)$.

2.1. Metric projections. For any metric space $(X, d)$ and $C \subset X$, define
the distance function by

$$d_C(x) = \inf_{c \in C} d(x, c), \qquad x \in X.$$

Note that the function $d_C$ is convex and continuous provided $X$ is CAT(0) and
$C$ is convex and complete [8, Cor. 2.5, p. 178].

Proposition 2.1. Let $(X, d)$ be a CAT(0) space and $C \subset X$ be complete and
convex. Then:

(i) For every $x \in X$, there exists a unique point $P_C(x) \in C$ such that
$d(x, P_C(x)) = d_C(x)$.

(ii) If $y \in [x, P_C(x)]$, then $P_C(x) = P_C(y)$.

(iii) If $x \in X \setminus C$ and $y \in C$ such that $P_C(x) \ne y$, then
$\alpha(x, P_C(x), y) \ge \pi/2$.

(iv) The mapping $P_C$ is a nonexpansive retraction from $X$ onto $C$.

Proof. See [8, Proposition 2.4, p. 176]. □

The mapping $P_C$ is called the metric projection onto $C$.

2.2. Weak convergence. A notion of weak convergence in CAT(0) spaces was first
introduced by Jürgen Jost in [14, Definition 2.7]. Sosov later defined his
$\psi$- and $\phi$-convergences, both generalizing Hilbert space weak convergence
into geodesic metric spaces [26]. Recently, Kirk and Panyanak extended Lim's
$\Delta$-convergence [21] into CAT(0) spaces [19], and finally, Espínola and
Fernández-León [11] modified Sosov's $\phi$-convergence to obtain an equivalent
formulation of $\Delta$-convergence in CAT(0) spaces. This is, however, exactly
the original weak convergence due to Jost [14].

Let $X$ be a complete CAT(0) space. Suppose $(x_n) \subset X$ is a bounded
sequence and define its asymptotic radius about a given point $x \in X$ as

$$r(x_n, x) = \limsup_{n \to \infty} d(x_n, x),$$

and asymptotic radius as

$$r(x_n) = \inf_{x \in X} r(x_n, x).$$

Further, we say that a point $x \in X$ is the asymptotic center of $(x_n)$ if

$$r(x_n, x) = r(x_n).$$

Since $X$ is a complete CAT(0) space we know that the asymptotic center of $(x_n)$
exists and is unique [TU1 Proposition 7].

We shall say that $(x_n) \subset X$ weakly converges to a point $x \in X$ if $x$
is the asymptotic center of each subsequence of $(x_n)$. We use the notation
$x_n \overset{w}{\to} x$. Clearly, if $x_n \to x$, then $x_n \overset{w}{\to} x$.

If there is a subsequence $(x_{n_k})$ of $(x_n)$ such that
$x_{n_k} \overset{w}{\to} z$ for some $z \in X$, we say that $z$ is a weak cluster
point of the sequence $(x_n)$. Each bounded sequence has a weak cluster point, see
[UJ Theorem 2.1], or [THl p. 3690].

Proposition 2.2. A bounded sequence $(x_n) \subset X$ weakly converges to a point
$x \in X$ if and only if, for any geodesic $\gamma : [0, 1] \to X$ with
$x \in \gamma$, we have

$$d(x, P_\gamma(x_n)) \to 0, \qquad \text{as } n \to \infty.$$

Proof. See [11, Proposition 5.2]. □

We shall say that a function $f : X \to (-\infty, \infty]$ is weakly lsc at a
given point $x \in X$ if

$$\liminf_{n \to \infty} f(x_n) \ge f(x)$$

for each sequence $x_n \overset{w}{\to} x$.

It is easy to verify that in Hilbert spaces the classical weak convergence coincides
with the weak convergence defined above.

2.3. Fejér monotone sequences. A sequence $(x_n) \subset X$ is Fejér monotone with
respect to $C$ if, for any $c \in C$,

$$d(x_{n+1}, c) \le d(x_n, c), \qquad n \in \mathbb{N}.$$

Proposition 2.3. Let $(x_n) \subset X$ be a Fejér monotone sequence with respect
to $C$. Then:

(i) $(x_n)$ is bounded.

(ii) $d_C(x_{n+1}) \le d_C(x_n)$ for each $n \in \mathbb{N}$.

(iii) $(x_n)$ weakly converges to some $x \in C$ if and only if all weak cluster
points of $(x_n)$ belong to $C$.

(iv) $(x_n)$ converges to some $x \in C$ if and only if $d(x_n, C) \to 0$.

Proof. See [3, Proposition 3.3]. □

3. Proofs of the theorems 

We begin with a useful lemma, whose proof follows easily from the fact that
a closed convex subset of a complete CAT(0) space is (sequentially) weakly closed
[5, Lemma 3.1]. The symbol $\overline{\operatorname{co}}\, A$ stands for the closed
convex hull of a subset $A$ of a CAT(0) space $X$, that is, the smallest closed
convex subset of $X$ containing $A$.

Lemma 3.1. Let $X$ be a complete CAT(0) space. If $f : X \to (-\infty, \infty]$ is
a lsc convex function, then it is weakly lsc.

Proof. By contradiction. Let $(x_n) \subset X$ and $x \in X$ be such that
$x_n \overset{w}{\to} x$, and suppose that

$$\liminf_{n \to \infty} f(x_n) < f(x).$$

That is, there exist a subsequence $(x_{n_k})$, an index $k_0 \in \mathbb{N}$, and
$\delta > 0$ such that $f(x_{n_k}) < f(x) - \delta$ for all $k > k_0$. By lower
semicontinuity and convexity of $f$, we get

$$f(y) \le f(x) - \delta$$

for all $y \in \overline{\operatorname{co}}\, \{x_{n_k} : k > k_0\}$. Since this
set is closed and convex, it is weakly closed by [5, Lemma 3.1], and so it would
have to contain the weak limit $x$, which is impossible. This yields a
contradiction to $x_n \overset{w}{\to} x$. □

Now we give the proof of Theorem 1.4.

Proof of Theorem 1.4. The set of minimizers of $f$,

$$C = \left\{ c \in X : f(c) = \inf_{x \in X} f(x) \right\},$$

is by the assumptions nonempty. Without loss of generality suppose $f|_C = 0$. We
first show that the sequence $(x_n)$ is Fejér monotone with respect to $C$. That
is, for a given $c \in C$, we are to verify

$$d(x_n, c) \le d(x_{n-1}, c), \qquad \text{for all } n \in \mathbb{N}. \tag{6}$$

Denote the $f(x_n)$-sublevel set of $f$ by

$$A = \{ x \in X : f(x) \le f(x_n) \}.$$

Then $c \in A$, and $x_n = P_A(x_{n-1})$, which together, along with
Proposition 2.1(iii), yield $\alpha(x_{n-1}, x_n, c) \ge \pi/2$. Hence we get (6).

In the next step we show that the inequality

$$\lambda_k f(x_k) \le \frac{1}{2} d(x_{k-1}, c)^2 - \frac{1}{2} d(x_k, c)^2 \tag{7}$$

holds for any $c \in C$ and $k \in \mathbb{N}$. Indeed, from the definition of
$x_k$ we have

$$\lambda_k f(x_k) + \frac{1}{2} d(x_k, x_{k-1})^2 \le \lambda_k f(p) + \frac{1}{2} d(p, x_{k-1})^2,$$

for any $p \in X$. In particular, let $t \in [0, 1)$ and $p_t = t x_k + (1 - t) c$,
then

$$\frac{1}{2} d(x_k, x_{k-1})^2 - \frac{1}{2} d(p_t, x_{k-1})^2 \le \lambda_k \left[ f(p_t) - f(x_k) \right].$$

Applying (2) to the above inequality, together with the convexity of $f$ in the
form $f(p_t) \le t f(x_k) + (1 - t) f(c)$, gives

$$\lambda_k (1 - t) \left[ f(c) - f(x_k) \right] \ge - \frac{1 - t}{2} d(c, x_{k-1})^2 + \frac{1 - t}{2} d(x_k, x_{k-1})^2 + \frac{t(1 - t)}{2} d(x_k, c)^2,$$

or, after taking into account that $f(c) = 0$, and $t \ne 1$,

$$\lambda_k f(x_k) \le \frac{1}{2} d(c, x_{k-1})^2 - \frac{1}{2} d(x_k, x_{k-1})^2 - \frac{t}{2} d(x_k, c)^2.$$

Since this inequality holds for all $t \in [0, 1)$, we conclude that

$$\lambda_k f(x_k) \le \frac{1}{2} d(c, x_{k-1})^2 - \frac{1}{2} d(x_k, x_{k-1})^2 - \frac{1}{2} d(x_k, c)^2,$$

and therefore (7) holds true.

From (7) and from the monotonicity of $(f(x_n))_n$, we now obtain

$$2 f(x_n) \sum_{k=1}^{n} \lambda_k \le 2 \sum_{k=1}^{n} \lambda_k f(x_k) \le d(x_0, c)^2 - d(x_n, c)^2,$$

and

$$f(x_n) \le \frac{d(x_0, c)^2}{2 \sum_{k=1}^{n} \lambda_k}. \tag{8}$$

By the assumptions, the right hand side of the last inequality goes to $0$, as
$n \to \infty$. We thence have that $(x_n)$ is a minimizing sequence, that is,
$f(x_n) \to 0$ as $n \to \infty$.
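The two facts established so far, Fejér monotonicity (6) and the bound (8), can be observed numerically. The sketch below uses the illustrative function $f(y) = y^2/2$ on the real line with minimizer $c = 0$ and a constant step size; the function name and the parameter values are assumptions made for this example.

```python
def resolvent_quadratic(x, lam):
    """Proximal step for f(y) = y**2 / 2: argmin_y y**2 / 2 + (y - x)**2 / (2*lam)."""
    return x / (1.0 + lam)

x0, lam = 4.0, 0.5
x, lam_sum = x0, 0.0
values, bounds, distances = [], [], [abs(x0)]
for n in range(1, 41):
    x = resolvent_quadratic(x, lam)
    lam_sum += lam
    values.append(0.5 * x * x)                   # f(x_n), with inf f = 0
    bounds.append(x0 ** 2 / (2.0 * lam_sum))     # right-hand side of (8) with c = 0
    distances.append(abs(x))                     # d(x_n, c)

bound_holds = all(v <= b for v, b in zip(values, bounds))
fejer_monotone = all(d2 <= d1 for d1, d2 in zip(distances, distances[1:]))
print(bound_holds, fejer_monotone)
```

For this particular function the values decay geometrically, so the bound (8) is far from tight here; it is a worst-case estimate valid for every convex lsc function with a minimizer.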

Assume now that a point $x_\infty \in X$ is a weak cluster point of $(x_n)$, that
is, there exists a subsequence $(x_{n_k})$ of $(x_n)$ such that
$x_{n_k} \overset{w}{\to} x_\infty$. Since $f$ is lsc, and therefore weakly lsc by
Lemma 3.1, we get $f(x_\infty) = 0$, and hence $x_\infty \in C$. Applying
Proposition 2.3(iii) finally gives $x_n \overset{w}{\to} x_\infty$, as
$n \to \infty$. This finishes the proof. □



The proof of Theorem 1.5 is similar to that of Theorem 1.4. The main ingredients
come from [23].



Proof of Theorem 1.5. Let $x \in \operatorname{dom} f$ be a (starting) point. We
first observe that $(T_\lambda x)_{\lambda \ge 0}$ weakly converges to a point
$z \in X$, as $\lambda \to \infty$, if and only if, for any sequence
$(\lambda_n) \subset [0, \infty)$ such that $\lambda_n \to \infty$, the sequence
$(T_{\lambda_n} x)_n$ weakly converges to $z$, as $n \to \infty$. Take therefore an
arbitrary sequence $(\lambda_n) \subset [0, \infty)$ such that
$\lambda_n \to \infty$. Denote

$$C = \left\{ c \in X : f(c) = \inf_{x \in X} f(x) \right\}.$$

The set $C$ is nonempty, and $(T_{\lambda_n} x)$ is Fejér monotone with respect to
$C$ by [23, Theorem 2.38]. Furthermore, the sequence $(T_{\lambda_n} x)$ is
minimizing, that is,

$$\lim_{n \to \infty} f(T_{\lambda_n} x) = \inf_{y \in X} f(y),$$

by [23, Theorem 2.39]. The same argument as above finishes the proof: $f$ is weakly
lsc (Lemma 3.1) and it follows that all weak cluster points of
$(T_{\lambda_n} x)$ lie in $C$. Finally, apply Proposition 2.3(iii). □

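Proof of Remark 1.7. Here we will show that if the function $f$ is uniformly convex
on bounded subsets of $\operatorname{dom} f$ with modulus $\phi$, we have even
strong convergence in Theorems 1.4 and 1.5. Indeed, let $(x_n)$ be the sequence
from Theorem 1.4. We already know that it is bounded, and hence, by uniform
convexity with $\alpha = 1/2$,

$$\frac{1}{4} \phi\left( d(x_n, x_m) \right) \le \frac{1}{2} f(x_n) + \frac{1}{2} f(x_m) - f\left( \tfrac{1}{2} x_n + \tfrac{1}{2} x_m \right) \tag{9}$$

for any $m, n \in \mathbb{N}$. We also already know that $(x_n)$ is a minimizing
sequence for $f$, so the right hand side of (9), which is at most
$\frac{1}{2} f(x_n) + \frac{1}{2} f(x_m) - \inf f$, tends to $0$ as
$m, n \to \infty$. Since $\phi$ is non-decreasing and vanishes only at $0$, we get
by (9) that $(x_n)$ is Cauchy.

The case of Theorem 1.5 is similar. □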